Person name recognition and linking from overlay text in TV broadcast shows

Authors

  • Géraldine Damnati
  • Benoît Favre
  • Frédéric Béchet
  • Delphine Charlet
Abstract

Identifying people in video broadcasts is by nature a multimodal task: people can be identified from biometric information (face or voice), or from a reference to their identity in the overlaid text or the speech content. In the framework of the French evaluation program Repere, this paper presents a method for identifying speakers in videos without any a priori models, based only on overlaid text, which is often used to introduce guests or journalists appearing for the first time in a given TV show. We show that Entity Linking improves speaker identification performance by reducing ambiguities in OCR transcriptions and by allowing biometric constraints to be added to the multimodal fusion process. All the methods presented are evaluated on the Repere video corpus of broadcast shows from 2 French TV channels and 5 different shows (news, talk shows, magazine).
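
To illustrate how linking can reduce ambiguity in OCR output, here is a minimal sketch, not the method of the paper. It assumes a small list of known person names and uses plain string similarity from Python's standard library; the function names, the example names, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(name.lower().split())


def link_name(ocr_text: str, known_names: list[str], threshold: float = 0.8):
    """Return the known name most similar to the OCR string, or None.

    Assumption: similarity is the SequenceMatcher ratio, and any score
    below `threshold` (a hypothetical value) means "no confident link".
    """
    best_name, best_score = None, 0.0
    for candidate in known_names:
        score = SequenceMatcher(
            None, normalize(ocr_text), normalize(candidate)
        ).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


# Hypothetical reference list of people expected in a show.
known = ["Jean Dupont", "Marie Durand"]

# An OCR string with a character error is mapped to its normalized name.
linked = link_name("Jean Dup0nt", known)

# Overlay text that is not a person name is left unlinked.
unlinked = link_name("Weather forecast", known)
```

Mapping noisy surface forms to one normalized identity is what lets several slightly different OCR readings count as the same person in later fusion steps; a real system would likely use a larger knowledge base and a more robust similarity measure.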

Similar resources

Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

The Repere challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through vide...

UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV task

This paper describes a system to identify people in broadcast TV shows in a purely unsupervised manner. The system outputs the identity of people that appear, talk and can be identified by using information appearing in the show (in our case, text with person names). Three types of monomodal technologies are used: speech diarization, video diarization and text detection / named entity recogniti...

Automatic transcription error recovery for Person Name Recognition

Person Name Recognition from transcriptions of the spoken content of TV shows is a crucial step towards multimedia document indexing. Recognizing Person Names implies the combination of three main modules: Automatic Speech Recognition, Named Entity Recognition and Entity Linking to associate the recognized surface form to a normalized Person Name. The three modules are potentially error prone. Hence, b...

Multimodal Person Discovery in Broadcast TV at MediaEval 2015

We describe the “Multimodal Person Discovery in Broadcast TV” task of the MediaEval 2015 benchmarking initiative. Participants were asked to return the names of people who can be both seen as well as heard in every shot of a collection of videos. The list of people was not known a priori and their names had to be discovered in an unsupervised way from media content using text overlay or speech trans...

Multimodal Person Discovery in Broadcast TV at MediaEval 2016

We describe the “Multimodal Person Discovery in Broadcast TV” task of the MediaEval 2016 benchmarking initiative. Participants are asked to return the names of people who can be both seen as well as heard in every shot of a collection of videos. The list of people is not known a priori and their names have to be discovered in an unsupervised way from media content using text overlay or speech transcr...

Publication date: 2014